8587 test errors on pytorch release 2508 on series 50 #8770
garciadias wants to merge 21 commits into Project-MONAI:dev from
Conversation
Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
…en USE_COMPILED=True

monai._C (grid_pull) was not compiled with sm_120 (Blackwell) architecture support, causing spatial_resample to produce incorrect results on RTX 50-series GPUs when USE_COMPILED=True. Add _compiled_unsupported() to detect compute capability major >= 12 at runtime and transparently fall back to the PyTorch-native affine_grid + grid_sample path, which is verified correct on sm_120. Fixes test_flips_inverse_124 in tests.transforms.spatial.test_spatial_resampled on NVIDIA GeForce RTX 5090 (Blackwell, sm_120).
The same USE_COMPILED guard that was fixed in spatial_resample (functional.py) was also present in Resample.__call__ (array.py), used by Affine, RandAffine and related transforms. Apply the same _compiled_unsupported() check so that grid_pull is not called on sm_120 (Blackwell) devices when monai._C lacks sm_120 support, preventing garbage output in test_affine, test_affined, test_rand_affine and test_rand_affined on RTX 50-series GPUs.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.

📝 Walkthrough: The diff mostly standardizes string formatting and minor tuple/whitespace adjustments across many modules. It introduces a private helper.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks: ✅ 2 passed | ❌ 3 failed (2 warnings, 1 inconclusive)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
monai/transforms/spatial/array.py (1)
2066-2117: ⚠️ Potential issue | 🟠 Major: Convert grid coordinates when falling back to grid_sample with norm_coords=False.

When _compiled_unsupported() returns True, the fallback path at line 2103+ passes grids directly to grid_sample without converting from the compiled convention [0, size-1] to the PyTorch convention [-1, 1]. This causes silent missampling on unsupported devices (Blackwell cc12.x). Add coordinate conversion in the else block, or reject the norm_coords=False + compiled fallback combination explicitly. Add test coverage for this scenario.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@monai/transforms/spatial/array.py` around lines 2066 - 2117, The fallback branch that calls torch.nn.functional.grid_sample (else block) fails to convert compiled-style grid coordinates [0, size-1] to PyTorch normalized coords [-1, 1] when self.norm_coords is False; update the else branch before calling grid_sample to convert grid_t per-dimension: for each i and corresponding spatial size dim from img_t.shape, compute grid_t[0, ..., i] = (grid_t[0, ..., i] * 2.0 / max(1, dim - 1)) - 1.0 (use max to avoid division by zero) so grid_sample receives normalized coords, and preserve dtype/device/memory_format as done elsewhere (target symbols: _compiled_unsupported, grid_sample, self.norm_coords, grid_t, img_t, moveaxis).
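The conversion this prompt asks for can be sketched in isolation. A hedged illustration: `grid_t` and `spatial_sizes` are stand-in names, not the exact symbols in array.py.

```python
import torch


def to_normalized_coords(grid_t: torch.Tensor, spatial_sizes: tuple[int, ...]) -> torch.Tensor:
    """Convert per-axis coordinates in [0, size-1] to [-1, 1] (align_corners=True)."""
    out = grid_t.clone()
    for i, dim in enumerate(spatial_sizes):
        # max(1, dim - 1) guards against division by zero for unit-size axes
        out[..., i] = out[..., i] * 2.0 / max(1, dim - 1) - 1.0
    return out
```

For a 3x5 image, a coordinate of 0 maps to -1 and a coordinate of size-1 maps to 1 on each axis, which is what grid_sample expects with align_corners=True.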
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@monai/apps/auto3dseg/bundle_gen.py`:
- Around line 266-268: The error message raised in BundleAlgo._run_cmd uses a
concatenated string missing a space after the period; update the
NotImplementedError text that references self.device_setting['MN_START_METHOD']
so it reads "...not supported yet. Try modify BundleAlgo._run_cmd for your
cluster." (preserve the current exception chaining "from err") to fix the
missing space between sentences.
In `@monai/apps/detection/networks/retinanet_detector.py`:
- Around line 519-522: The error message raised when self.inferer is None
contains a missing space after the period; update the string in the raise
ValueError call to insert a space so it reads "`self.inferer` is not defined.
Please refer to function self.set_sliding_window_inferer(*)" (locate the check
referencing self.inferer and the call to self.set_sliding_window_inferer to
apply the fix).
---
Outside diff comments:
In `@monai/transforms/spatial/array.py`:
- Around line 2066-2117: The fallback branch that calls
torch.nn.functional.grid_sample (else block) fails to convert compiled-style
grid coordinates [0, size-1] to PyTorch normalized coords [-1, 1] when
self.norm_coords is False; update the else branch before calling grid_sample to
convert grid_t per-dimension: for each i and corresponding spatial size dim from
img_t.shape, compute grid_t[0, ..., i] = (grid_t[0, ..., i] * 2.0 / max(1, dim -
1)) - 1.0 (use max to avoid division by zero) so grid_sample receives normalized
coords, and preserve dtype/device/memory_format as done elsewhere (target
symbols: _compiled_unsupported, grid_sample, self.norm_coords, grid_t, img_t,
moveaxis).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: aa93797a-0eea-4904-a754-c937994e99c8
📒 Files selected for processing (14)
- monai/apps/auto3dseg/bundle_gen.py
- monai/apps/detection/networks/retinanet_detector.py
- monai/apps/detection/utils/anchor_utils.py
- monai/apps/detection/utils/detector_utils.py
- monai/auto3dseg/analyzer.py
- monai/data/wsi_reader.py
- monai/losses/unified_focal_loss.py
- monai/metrics/meandice.py
- monai/networks/blocks/patchembedding.py
- monai/networks/layers/factories.py
- monai/transforms/croppad/array.py
- monai/transforms/regularization/array.py
- monai/transforms/spatial/array.py
- monai/transforms/spatial/functional.py
Tolerance tests are passing. 🐧 ❯ export CUDAVERSION=slim; docker run --rm -v ./:/opt/monai/ --name monai_$CUDAVERSION --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it monai:$CUDAVERSION python -m unittest \
tests.transforms.test_affine \
tests.transforms.test_affined \
tests.transforms.test_affine_grid \
tests.transforms.test_rand_affine \
tests.transforms.test_rand_affine_grid \
tests.transforms.test_rand_affined \
tests.transforms.test_create_grid_and_affine \
tests.transforms.test_spatial_resample \
tests.transforms.spatial.test_spatial_resampled \
tests.networks.layers.test_affine_transform \
tests.data.utils.test_zoom_affine \
tests.integration.test_meta_affine
tf32 enabled: False
monai.transforms.spatial.array Orientation.__init__:labels: Current default value of argument `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` was changed in version None from `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` to `labels=None`. Default value changed to None meaning that the transform now uses the 'space' of a meta-tensor, if applicable, to determine appropriate axis labels.
monai.transforms.spatial.dictionary Orientationd.__init__:labels: Current default value of argument `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` was changed in version None from `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` to `labels=None`. Default value changed to None meaning that the transform now uses the 'space' of a meta-tensor, if applicable, to determine appropriate axis labels.
.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................__array__ implementation doesn't accept a copy keyword, so passing copy=False failed. __array__ must implement 'dtype' and 'copy' keyword arguments. To learn more, see the migration guide https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword
........__array__ implementation doesn't accept a copy keyword, so passing copy=False failed. __array__ must implement 'dtype' and 'copy' keyword arguments. To learn more, see the migration guide https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword
...................................................................................................................................................................................................................................................................................Since version 1.3.0, affine_grid behavior has changed for unit-size grids when align_corners=True. This is not an intended use case of affine_grid. See the documentation of affine_grid for details.
...........................2026-03-09 13:37:04,857 - INFO - Verified 'ref_avg152T1_LR.nii.gz', sha256: c01a50caa7a563158ecda43d93a1466bfc8aa939bc16b06452ac1089c54661c8.
2026-03-09 13:37:04,858 - INFO - File exists: /opt/monai/tests/testing_data/ref_avg152T1_LR.nii.gz, skipped downloading.
2026-03-09 13:37:04,858 - INFO - Verified 'ref_avg152T1_RL.nii.gz', sha256: 8a731128dac4de46ccb2cc60d972b98f75a52f21fb63ddb040ca96f0aed8b51a.
2026-03-09 13:37:04,858 - INFO - File exists: /opt/monai/tests/testing_data/ref_avg152T1_RL.nii.gz, skipped downloading.
builtin type SwigPyPacked has no __module__ attribute
builtin type SwigPyObject has no __module__ attribute
builtin type swigvarlink has no __module__ attribute
...........monai.transforms.spatial.array Orientation.__init__:labels: Current default value of argument `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` was changed in version None from `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` to `labels=None`. Default value changed to None meaning that the transform now uses the 'space' of a meta-tensor, if applicable, to determine appropriate axis labels.
........................monai.transforms.spatial.dictionary Orientationd.__init__:labels: Current default value of argument `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` was changed in version None from `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` to `labels=None`. Default value changed to None meaning that the transform now uses the 'space' of a meta-tensor, if applicable, to determine appropriate axis labels.
............
----------------------------------------------------------------------
Ran 910 tests in 12.732s
OK
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
…ytorch-release-2508-on-series-50

DCO Remediation Commit for R. Garcia-Dias <rafaelagd@gmail.com>

I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: ba56a6d
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 09c2cd9
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 7cd0607

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
Force-pushed f64b17a to 356956a (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (1)
monai/apps/detection/networks/retinanet_detector.py (1)
519-522: ⚠️ Potential issue | 🟡 Minor: Missing space after period.

"is not defined.Please" → "is not defined. Please".

Proposed fix:

  raise ValueError(
-     "`self.inferer` is not defined.Please refer to function self.set_sliding_window_inferer(*)."
+     "`self.inferer` is not defined. Please refer to function self.set_sliding_window_inferer(*)."
  )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@monai/apps/detection/networks/retinanet_detector.py` around lines 519 - 522, The ValueError message raised when self.inferer is None in retinanet_detector.py contains a missing space after the period; update the string in the check inside the method where self.inferer is validated (the block referencing self.inferer and the suggestion to call self.set_sliding_window_inferer) to read "`self.inferer` is not defined. Please refer to function self.set_sliding_window_inferer(*)`" so there is a space after the period.
🧹 Nitpick comments (1)
monai/transforms/spatial/functional.py (1)
57-76: Function implementation is sound; docstring lacks formal Args/Returns.

The logic correctly gates Blackwell GPUs (sm_120+). Consider adding explicit Args: and Returns: sections to align with Google-style docstrings per project conventions.

📝 Suggested docstring format:

  def _compiled_unsupported(device: torch.device) -> bool:
      """
      Return True if ``monai._C`` (the compiled C extension providing ``grid_pull``)
      is not compiled with support for the given CUDA device's compute capability.
-
+
+     Args:
+         device: The torch device to check for compiled extension support.
+
+     Returns:
+         True if the device is CUDA with compute capability major >= 12 (Blackwell+),
+         False otherwise.
+
      ``monai._C`` is built at install time against a fixed set of CUDA architectures.
Verify each finding against the current code and only fix it if needed. In `@monai/transforms/spatial/functional.py` around lines 57 - 76, The _compiled_unsupported function's docstring is missing formal Google-style "Args:" and "Returns:" sections; update the docstring for _compiled_unsupported(device: torch.device) to include an "Args:" entry describing the device parameter (type and semantics) and a "Returns:" entry stating it returns a bool indicating whether the compiled monai._C lacks support for the device (True for CUDA devices with major compute capability >= 12, False otherwise), keeping the existing explanatory text intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@monai/transforms/spatial/functional.py`:
- Around line 57-76: Warp.forward currently calls grid_pull when USE_COMPILED is
true without checking whether the compiled C extension supports the device;
update Warp.forward to compute _use_compiled = USE_COMPILED and not
_compiled_unsupported(image.device) (reuse the existing _compiled_unsupported
function from functional.py) and branch: if not _use_compiled use the PyTorch
native affine_grid + grid_sample path, else call grid_pull as before, ensuring
grid_pull is only invoked on supported CUDA devices (reference symbols:
Warp.forward, USE_COMPILED, _compiled_unsupported, grid_pull).
---
Duplicate comments:
In `@monai/apps/detection/networks/retinanet_detector.py`:
- Around line 519-522: The ValueError message raised when self.inferer is None
in retinanet_detector.py contains a missing space after the period; update the
string in the check inside the method where self.inferer is validated (the block
referencing self.inferer and the suggestion to call
self.set_sliding_window_inferer) to read "`self.inferer` is not defined. Please
refer to function self.set_sliding_window_inferer(*)`" so there is a space after
the period.
---
Nitpick comments:
In `@monai/transforms/spatial/functional.py`:
- Around line 57-76: The _compiled_unsupported function's docstring is missing
formal Google-style "Args:" and "Returns:" sections; update the docstring for
_compiled_unsupported(device: torch.device) to include an "Args:" entry
describing the device parameter (type and semantics) and a "Returns:" entry
stating it returns a bool indicating whether the compiled monai._C lacks support
for the device (True for CUDA devices with major compute capability >= 12, False
otherwise), keeping the existing explanatory text intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a83bf9dd-d11c-4601-9888-15b532b84f0b
📒 Files selected for processing (7)
- monai/apps/auto3dseg/bundle_gen.py
- monai/apps/detection/networks/retinanet_detector.py
- monai/auto3dseg/analyzer.py
- monai/data/wsi_reader.py
- monai/transforms/regularization/array.py
- monai/transforms/spatial/array.py
- monai/transforms/spatial/functional.py
💤 Files with no reviewable changes (1)
- monai/transforms/regularization/array.py
🚧 Files skipped from review as they are similar to previous changes (2)
- monai/data/wsi_reader.py
- monai/apps/auto3dseg/bundle_gen.py
======================================================================
ERROR: test_dataloading_img_label_0 (tests.data.test_arraydataset.TestArrayDataset.test_dataloading_img_label_0)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
return func(*(a + p.args), **p.kwargs, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/tests/data/test_arraydataset.py", line 184, in test_dataloading_img_label
new_data = next(iter(loader)) # test batching
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/dataloader.py", line 741, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/dataloader.py", line 1548, in _next_data
return self._process_data(data, worker_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/dataloader.py", line 1586, in _process_data
data.reraise()
File "/usr/local/lib/python3.11/dist-packages/torch/_utils.py", line 775, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/worker.py", line 358, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/fetch.py", line 57, in fetch
return self.collate_fn(data)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/collate.py", line 401, in default_collate
return collate(batch, collate_fn_map=default_collate_fn_map)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/collate.py", line 214, in collate
return [
^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/collate.py", line 215, in <listcomp>
collate(samples, collate_fn_map=collate_fn_map)
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/collate.py", line 155, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/monai/data/utils.py", line 426, in collate_meta_tensor_fn
collated = collate_tensor_fn(batch)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/collate.py", line 273, in collate_tensor_fn
storage = elem._typed_storage()._new_shared(numel, device=elem.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/storage.py", line 1201, in _new_shared
untyped_storage = torch.UntypedStorage._new_shared(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/storage.py", line 414, in _new_shared
return cls._new_using_fd_cpu(size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: unable to allocate shared memory(shm) for file </torch_2854_386661935_41>: Success (0)
======================================================================
ERROR: test_multiprocess_mapping_file (tests.data.test_mapping_file.TestWriteFileMapping.test_multiprocess_mapping_file)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/monai/tests/data/test_mapping_file.py", line 102, in test_multiprocess_mapping_file
for _ in multi_loader:
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/dataloader.py", line 741, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/dataloader.py", line 1548, in _next_data
return self._process_data(data, worker_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/dataloader.py", line 1586, in _process_data
data.reraise()
File "/usr/local/lib/python3.11/dist-packages/torch/_utils.py", line 775, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/opt/monai/monai/data/utils.py", line 484, in list_data_collate
ret = collate_fn(data)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/collate.py", line 401, in default_collate
return collate(batch, collate_fn_map=default_collate_fn_map)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/collate.py", line 155, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/monai/data/utils.py", line 426, in collate_meta_tensor_fn
collated = collate_tensor_fn(batch)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/collate.py", line 273, in collate_tensor_fn
storage = elem._typed_storage()._new_shared(numel, device=elem.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/storage.py", line 1201, in _new_shared
untyped_storage = torch.UntypedStorage._new_shared(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/storage.py", line 414, in _new_shared
return cls._new_using_fd_cpu(size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: unable to allocate shared memory(shm) for file </torch_3126_4237481189_0>: Success (0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/worker.py", line 358, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/_utils/fetch.py", line 57, in fetch
return self.collate_fn(data)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/monai/data/utils.py", line 497, in list_data_collate
raise RuntimeError(re_str) from re
RuntimeError: unable to allocate shared memory(shm) for file </torch_3126_4237481189_0>: Success (0)
----------------------------------------------------------------------
Ran 17226 tests in 844.247s
FAILED (errors=2, skipped=1024)
- Import _compiled_unsupported from spatial.functional
- Check device compatibility before using grid_pull
- Fall back to PyTorch grid_sample for unsupported CUDA devices
- Prevents silent failures on Blackwell GPUs and future architectures

- Add coordinate conversion from [0, size-1] to [-1, 1] convention
- Implement fallback to PyTorch grid_sample when compiled extension unavailable
- Prevents misaligned output on Blackwell GPUs and unsupported architectures
- Ensures correct behavior when _compiled_unsupported() returns True

- Add Args section documenting device parameter
- Add Returns section describing return values
- Add Note section with detailed explanation
- Clarify compute capability threshold and rebuild implications

- bundle_gen.py: Add space after period in NotImplementedError message
- retinanet_detector.py: Add space after period in ValueError message
- Improves readability and follows English punctuation standards

- TestCompiledUnsupported: Verify device detection logic
- TestResampleFallback: Verify fallback behavior on unsupported devices
- Tests for CPU, CUDA, and non-CUDA device handling
- Uses unittest framework only (no pytest dependency)
- All tests pass on current supported architectures
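The unittest-only device-detection tests that this commit describes might look like the sketch below. It is a hedged illustration: _compiled_unsupported is re-implemented inline so the sketch stays self-contained, an assumption rather than the PR's actual import path.

```python
import unittest

import torch


def _compiled_unsupported(device: torch.device) -> bool:
    """Inline stand-in for the PR's helper: True for CUDA cc major >= 12."""
    if device.type != "cuda" or not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_properties(device).major >= 12


class TestCompiledUnsupported(unittest.TestCase):
    def test_cpu_device_always_supported(self):
        # CPU never routes through grid_pull's CUDA kernels
        self.assertFalse(_compiled_unsupported(torch.device("cpu")))

    @unittest.skipIf(not torch.cuda.is_available(), "CUDA not available")
    def test_cuda_device_detection(self):
        device = torch.device("cuda:0")
        cc_major = torch.cuda.get_device_properties(device).major
        # unsupported exactly when the device is Blackwell (cc 12.x) or newer
        self.assertEqual(_compiled_unsupported(device), cc_major >= 12)
```

On CPU-only machines the CUDA case is skipped, so the suite still passes, matching the commit's note that all tests pass on current supported architectures.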
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
monai/networks/blocks/warp.py (1)
142-154: ⚠️ Potential issue | 🔴 Critical: Mode type mismatch causes runtime failure on unsupported devices.

When USE_COMPILED=True at instantiation, self._interp_mode and self._padding_mode are integers (e.g., 1, 7). If _compiled_unsupported() returns True at runtime (Blackwell GPU), the code falls back to F.grid_sample, which requires string modes ("bilinear", "zeros"). Passing integers will raise RuntimeError.

🐛 Proposed fix: store both mode representations

  def __init__(self, mode=GridSampleMode.BILINEAR.value, padding_mode=GridSamplePadMode.BORDER.value, jitter=False):
      ...
      super().__init__()
-     # resolves _interp_mode for different methods
-
+     # Store native (string) mode for fallback path
+     self._native_interp_mode = GridSampleMode(mode).value
+     self._native_padding_mode = GridSamplePadMode(padding_mode).value
+
+     # resolves _interp_mode for compiled path
      if USE_COMPILED:
          if mode in (inter.value for inter in GridSampleMode):
              mode = GridSampleMode(mode)
              if mode == GridSampleMode.BILINEAR:
                  mode = 1
              elif mode == GridSampleMode.NEAREST:
                  mode = 0
              elif mode == GridSampleMode.BICUBIC:
                  mode = 3
              else:
                  mode = 1  # default to linear
          self._interp_mode = mode
      else:
-         warnings.warn("monai.networks.blocks.Warp: Using PyTorch native grid_sample.")
-         self._interp_mode = GridSampleMode(mode).value
+         self._interp_mode = self._native_interp_mode

      # resolves _padding_mode for different methods
      if USE_COMPILED:
          ...
          self._padding_mode = padding_mode
      else:
-         self._padding_mode = GridSamplePadMode(padding_mode).value
+         self._padding_mode = self._native_padding_mode

Then in forward:

  if not _use_compiled:
      # pytorch native grid_sample
      for i, dim in enumerate(grid.shape[1:-1]):
          grid[..., i] = grid[..., i] * 2 / (dim - 1) - 1
      index_ordering: list[int] = list(range(spatial_dims - 1, -1, -1))
      grid = grid[..., index_ordering]  # z, y, x -> x, y, z
      return F.grid_sample(
-         image, grid, mode=self._interp_mode, padding_mode=f"{self._padding_mode}", align_corners=True
+         image, grid, mode=self._native_interp_mode, padding_mode=self._native_padding_mode, align_corners=True
      )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@monai/networks/blocks/warp.py` around lines 142 - 154, The fallback path to F.grid_sample fails when self._interp_mode and self._padding_mode are stored as integers by the compiled path; update the forward logic around the _use_compiled check (where USE_COMPILED and _compiled_unsupported are used) to convert the integer modes to the string names expected by torch.nn.functional.grid_sample (e.g., map numeric self._interp_mode to "bilinear"/"nearest" and numeric self._padding_mode to "zeros"/"border"/"reflection") before calling F.grid_sample; keep the compiled-path call to grid_pull unchanged and reuse any existing mappings or create a minimal mapping helper near the class (referencing self._interp_mode, self._padding_mode, F.grid_sample, and grid_pull) so runtime fallback on unsupported devices uses string mode values.
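The dual-representation idea behind this fix can be sketched in isolation. A hedged illustration: the class shape is simplified, and the integer interpolation codes (nearest=0, bilinear=1, bicubic=3) are assumptions taken from the review text, not Warp itself.

```python
class ModeStore:
    """Keep the grid_sample-compatible string mode next to the compiled int code."""

    _TO_INT = {"nearest": 0, "bilinear": 1, "bicubic": 3}  # compiled codes (assumed)

    def __init__(self, mode: str, use_compiled: bool):
        # always retain the string form for the F.grid_sample fallback path
        self._native_interp_mode = mode
        # the compiled grid_pull path wants an integer code; default to bilinear (1)
        self._interp_mode = self._TO_INT.get(mode, 1) if use_compiled else mode

    def fallback_mode(self) -> str:
        """Mode string that is always safe to pass to F.grid_sample."""
        return self._native_interp_mode
```

Storing both forms at construction means the runtime fallback never needs to reverse-map an integer back to a string, which is the failure mode the review flags.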
🧹 Nitpick comments (3)
tests/transforms/test_spatial_gpu_support.py (3)
26-34: Redundant test: test_non_cuda_device_always_supported duplicates test_cpu_device_always_supported.

Both tests create a CPU device and assert the same condition. Either remove the duplicate or expand the second test to cover other non-CUDA device types (e.g., "mps" if relevant).

♻️ Suggested fix

-     def test_non_cuda_device_always_supported(self):
-         """Non-CUDA devices should always be supported."""
-         device = torch.device("cpu")
-         self.assertFalse(_compiled_unsupported(device))
Verify each finding against the current code and only fix it if needed. In `@tests/transforms/test_spatial_gpu_support.py` around lines 26 - 34, The test test_non_cuda_device_always_supported is a duplicate of test_cpu_device_always_supported; remove the redundant test or replace it to cover other non-CUDA device types (e.g., create torch.device("mps") and assert False from _compiled_unsupported(device) when MPS is relevant), ensuring you keep one explicit CPU check (test_cpu_device_always_supported) and either delete test_non_cuda_device_always_supported or modify it to assert behavior for other non-CUDA devices using the same helper _compiled_unsupported.
65-77: Consider overlap with TestCompiledUnsupported.test_cuda_device_detection.

This test duplicates the CUDA detection logic from lines 36-47. If both are intended, consolidate; otherwise pick one location.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/transforms/test_spatial_gpu_support.py` around lines 65 - 77, The test test_compiled_unsupported_logic duplicates CUDA detection logic already present in TestCompiledUnsupported.test_cuda_device_detection; remove the duplication by either deleting test_compiled_unsupported_logic or merge its assertions into TestCompiledUnsupported.test_cuda_device_detection, keeping the validation of _compiled_unsupported(cuda_device) vs. cc_major >= 12 under a single test; ensure the helper _compiled_unsupported is exercised exactly once for CUDA detection to avoid redundant coverage.
36-47: Unnecessary conditionals inside skip-guarded test.

Line 39's ternary and line 40's if device.type == "cuda" are redundant; the @unittest.skipIf already guarantees CUDA availability. Simplify:

♻️ Suggested simplification

  @unittest.skipIf(not torch.cuda.is_available(), reason="CUDA not available")
  def test_cuda_device_detection(self):
      """Verify CUDA compute capability detection."""
-     device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
-     if device.type == "cuda":
-         cc_major = torch.cuda.get_device_properties(device).major
-         unsupported = _compiled_unsupported(device)
-         # Device is unsupported if cc_major >= 12
-         if cc_major >= 12:
-             self.assertTrue(unsupported)
-         else:
-             self.assertFalse(unsupported)
+     device = torch.device("cuda:0")
+     cc_major = torch.cuda.get_device_properties(device).major
+     unsupported = _compiled_unsupported(device)
+     # Device is unsupported if cc_major >= 12
+     if cc_major >= 12:
+         self.assertTrue(unsupported)
+     else:
+         self.assertFalse(unsupported)
Verify each finding against the current code and only fix it if needed. In `@tests/transforms/test_spatial_gpu_support.py` around lines 36 - 47, The test_cuda_device_detection contains redundant checks guarded by `@unittest.skipIf`(torch.cuda.is_available()), so remove the ternary and inner if: set device directly to torch.device("cuda:0") (no fallback), then call torch.cuda.get_device_properties(device).major and _compiled_unsupported(device) and assert based on cc_major >= 12; update references to device, torch.cuda.is_available, and _compiled_unsupported accordingly to eliminate the needless conditional branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/transforms/test_spatial_gpu_support.py`:
- Around line 80-83: Remove the duplicated module entry point: there are two
identical "if __name__ == \"__main__\": unittest.main()" blocks in
tests/transforms/test_spatial_gpu_support.py—delete the extra one so only a
single main invocation remains at the end of the file.
- Around line 59-63: The test method test_resample_compilation_flag_respected is
empty and will always pass; either implement a real check or explicitly
skip/remove it. Fix by implementing a unit test that verifies Resample respects
the _compiled_unsupported flag (e.g., mock device properties or simulate a
Blackwell GPU and assert Resample raises/skips compilation) or replace the body
with unittest.skip("TODO: implement test for _compiled_unsupported") or raise
unittest.SkipTest with a clear reason; target the test method name
test_resample_compilation_flag_respected and any helpers used to construct
Resample/mocked device properties.
---
Outside diff comments:
In `@monai/networks/blocks/warp.py`:
- Around line 142-154: The fallback path to F.grid_sample fails when
self._interp_mode and self._padding_mode are stored as integers by the compiled
path; update the forward logic around the _use_compiled check (where
USE_COMPILED and _compiled_unsupported are used) to convert the integer modes to
the string names expected by torch.nn.functional.grid_sample (e.g., map numeric
self._interp_mode to "bilinear"/"nearest" and numeric self._padding_mode to
"zeros"/"border"/"reflection") before calling F.grid_sample; keep the
compiled-path call to grid_pull unchanged and reuse any existing mappings or
create a minimal mapping helper near the class (referencing self._interp_mode,
self._padding_mode, F.grid_sample, and grid_pull) so runtime fallback on
unsupported devices uses string mode values.
---
Nitpick comments:
In `@tests/transforms/test_spatial_gpu_support.py`:
- Around line 26-34: The test test_non_cuda_device_always_supported is a
duplicate of test_cpu_device_always_supported; remove the redundant test or
replace it to cover other non-CUDA device types (e.g., create
torch.device("mps") and assert False from _compiled_unsupported(device) when MPS
is relevant), ensuring you keep one explicit CPU check
(test_cpu_device_always_supported) and either delete
test_non_cuda_device_always_supported or modify it to assert behavior for other
non-CUDA devices using the same helper _compiled_unsupported.
- Around line 65-77: The test test_compiled_unsupported_logic duplicates CUDA
detection logic already present in
TestCompiledUnsupported.test_cuda_device_detection; remove the duplication by
either deleting test_compiled_unsupported_logic or merge its assertions into
TestCompiledUnsupported.test_cuda_device_detection, keeping the validation of
_compiled_unsupported(cuda_device) vs. cc_major >= 12 under a single test;
ensure the helper _compiled_unsupported is exercised exactly once for CUDA
detection to avoid redundant coverage.
- Around line 36-47: The test_cuda_device_detection contains redundant checks
already guarded by `@unittest.skipIf`(not torch.cuda.is_available()), so remove the ternary
and inner if: set device directly to torch.device("cuda:0") (no fallback), then
call torch.cuda.get_device_properties(device).major and
_compiled_unsupported(device) and assert based on cc_major >= 12; update
references to device, torch.cuda.is_available, and _compiled_unsupported
accordingly to eliminate the needless conditional branches.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4e55f54c-4323-4d1d-9f33-104f1e2e6279
📒 Files selected for processing (19)
- monai/apps/auto3dseg/bundle_gen.py
- monai/apps/detection/networks/retinanet_detector.py
- monai/apps/detection/transforms/box_ops.py
- monai/auto3dseg/analyzer.py
- monai/bundle/scripts.py
- monai/data/dataset.py
- monai/handlers/utils.py
- monai/metrics/utils.py
- monai/networks/blocks/warp.py
- monai/transforms/io/array.py
- monai/transforms/spatial/array.py
- monai/transforms/spatial/functional.py
- monai/transforms/utility/array.py
- tests/integration/test_loader_semaphore.py
- tests/profile_subclass/profiling.py
- tests/profile_subclass/pyspy_profiling.py
- tests/transforms/croppad/test_pad_nd_dtypes.py
- tests/transforms/test_spatial_gpu_support.py
- versioneer.py
✅ Files skipped from review due to trivial changes (15)
- tests/profile_subclass/pyspy_profiling.py
- tests/integration/test_loader_semaphore.py
- monai/transforms/io/array.py
- tests/profile_subclass/profiling.py
- tests/transforms/croppad/test_pad_nd_dtypes.py
- monai/handlers/utils.py
- versioneer.py
- monai/metrics/utils.py
- monai/auto3dseg/analyzer.py
- monai/apps/detection/transforms/box_ops.py
- monai/bundle/scripts.py
- monai/apps/auto3dseg/bundle_gen.py
- monai/apps/detection/networks/retinanet_detector.py
- monai/data/dataset.py
- monai/transforms/utility/array.py
🚧 Files skipped from review as they are similar to previous changes (2)
- monai/transforms/spatial/functional.py
- monai/transforms/spatial/array.py
@unittest.skipIf(not torch.cuda.is_available(), reason="CUDA not available")
def test_resample_compilation_flag_respected(self):
    """Verify Resample respects _compiled_unsupported check."""
    # This would require internal inspection or output verification
    # Could test with mock device properties or actual Blackwell GPU
Empty test method provides no coverage.
test_resample_compilation_flag_respected has no assertions or body—it will pass silently. Either implement the test (mock device properties or check output behavior) or remove the placeholder with a TODO issue.
♻️ Option: Skip explicitly with reason
 @unittest.skipIf(not torch.cuda.is_available(), reason="CUDA not available")
+@unittest.skip("TODO: implement with mock device properties or Blackwell GPU")
 def test_resample_compilation_flag_respected(self):
     """Verify Resample respects _compiled_unsupported check."""
-    # This would require internal inspection or output verification
-    # Could test with mock device properties or actual Blackwell GPU
+    pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/transforms/test_spatial_gpu_support.py` around lines 59 - 63, The test
method test_resample_compilation_flag_respected is empty and will always pass;
either implement a real check or explicitly skip/remove it. Fix by implementing
a unit test that verifies Resample respects the _compiled_unsupported flag
(e.g., mock device properties or simulate a Blackwell GPU and assert Resample
raises/skips compilation) or replace the body with unittest.skip("TODO:
implement test for _compiled_unsupported") or raise unittest.SkipTest with a
clear reason; target the test method name
test_resample_compilation_flag_respected and any helpers used to construct
Resample/mocked device properties.
if __name__ == "__main__":
    unittest.main()
if __name__ == "__main__":
    unittest.main()
Duplicated if __name__ == "__main__" block.
Copy-paste error—remove the duplicate.
🐛 Fix
if __name__ == "__main__":
unittest.main()
-if __name__ == "__main__":
-    unittest.main()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if __name__ == "__main__":
    unittest.main()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/transforms/test_spatial_gpu_support.py` around lines 80 - 83, Remove
the duplicated module entry point: there are two identical "if __name__ ==
\"__main__\": unittest.main()" blocks in
tests/transforms/test_spatial_gpu_support.py—delete the extra one so only a
single main invocation remains at the end of the file.
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 8de64af
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 7c2ddb6
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: cfe5524
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 1dec216
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: ec8bf1f
Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
🧹 Nitpick comments (1)
tests/transforms/test_spacing.py (1)
34-37: Missing docstring on helper function. Per coding guidelines, definitions should have docstrings describing parameters and return values.
📝 Suggested docstring
 def _template_5_expected_output(device: torch.device) -> torch.Tensor:
+    """Return expected output tensor for template 5 based on device capabilities.
+
+    Args:
+        device: Target torch device to check for compiled extension support.
+
+    Returns:
+        _TEMPLATE_5_COMPILED if compiled path is active, _TEMPLATE_5_NATIVE otherwise.
+    """
     if USE_COMPILED and not _compiled_unsupported(device):
         return _TEMPLATE_5_COMPILED
     return _TEMPLATE_5_NATIVE
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/transforms/test_spacing.py` around lines 34 - 37, Add a docstring to the helper function _template_5_expected_output that briefly describes what the function returns, documents the device parameter (type: torch.device) and the return value (torch.Tensor), and notes behavior differences when USE_COMPILED is true and _compiled_unsupported(device) is checked; place the docstring immediately under the def line for _template_5_expected_output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/transforms/test_spacing.py`:
- Around line 34-37: Add a docstring to the helper function
_template_5_expected_output that briefly describes what the function returns,
documents the device parameter (type: torch.device) and the return value
(torch.Tensor), and notes behavior differences when USE_COMPILED is true and
_compiled_unsupported(device) is checked; place the docstring immediately under
the def line for _template_5_expected_output.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 143d48d8-9830-4c5f-b6dc-584e8fedbffd
📒 Files selected for processing (7)
- monai/apps/detection/transforms/dictionary.py
- monai/apps/detection/utils/anchor_utils.py
- monai/apps/reconstruction/transforms/array.py
- monai/bundle/utils.py
- monai/losses/dice.py
- monai/losses/focal_loss.py
- tests/transforms/test_spacing.py
✅ Files skipped from review due to trivial changes (6)
- monai/bundle/utils.py
- monai/losses/focal_loss.py
- monai/apps/detection/utils/anchor_utils.py
- monai/apps/reconstruction/transforms/array.py
- monai/losses/dice.py
- monai/apps/detection/transforms/dictionary.py
Fixes #8587.
Description
NVIDIA Blackwell GPUs (compute capability 12.x, sm_120) are not included in the TORCH_CUDA_ARCH_LIST used when building the MONAI compiled C extension (monai._C). When USE_COMPILED=True, calling grid_pull on a Blackwell device produces incorrect results because the extension was not compiled for that architecture.
This PR adds a _compiled_unsupported(device) helper in monai/transforms/spatial/functional.py that detects whether the current CUDA device's compute capability is unsupported by the compiled extension (major >= 12). When detected, both spatial_resample and the Resample class fall back to the PyTorch-native affine_grid + grid_sample path, giving correct output on all architectures. The behaviour on all previously supported GPUs (sm_75 through sm_90) is unchanged.
Types of changes
Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
In-line docstrings updated.
Documentation updated, tested make html command in the docs/ folder.